Fix: Enforce strict 4-digit limit on JSON Unicode escapes to prevent greedy parsing #3116
+40
−16
Summary of Changes
This PR fixes a logic issue in `AbstractJsonLexer` where Unicode escape sequences (`\uXXXX`) could be parsed greedily. Previously, if a valid 4-digit Unicode sequence was followed immediately by a character in the range `[a-fA-F0-9]`, the lexer would incorrectly attempt to consume it as part of the hex sequence, leading to corruption or parsing errors.
Example of failure:
Input: `"\u00f3n"` (intended: "ón")
Previous behavior: Parsed correctly.
Input: `"\u00f3a"` (intended: "óa")
Previous behavior: The lexer aggressively consumed 'a' as a 5th hex digit.
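The difference between the two inputs can be reproduced with a small standalone sketch. The helpers `isHex`, `greedyHexLength`, and `strictHexLength` are hypothetical names introduced for illustration only; they are not taken from `AbstractJsonLexer`, and the greedy loop is just one plausible shape of the failure described above.

```kotlin
// Hypothetical helpers for illustration; not the library's actual code.

private fun isHex(c: Char): Boolean =
    c in '0'..'9' || c in 'a'..'f' || c in 'A'..'F'

// Greedy: keeps consuming for as long as it sees hex digits.
// Returns the number of characters consumed, starting at `start` (just past "\u").
fun greedyHexLength(source: String, start: Int): Int {
    var i = start
    while (i < source.length && isHex(source[i])) i++
    return i - start
}

// Strict: consumes exactly four characters, all of which must be hex digits.
fun strictHexLength(source: String, start: Int): Int {
    require(start + 4 <= source.length) { "Truncated unicode escape" }
    for (i in start until start + 4) {
        require(isHex(source[i])) { "Invalid hex digit '${source[i]}'" }
    }
    return 4
}

fun main() {
    val followedByN = "\\u00f3n" // 'n' is not a hex digit, so a greedy scan stops in time
    val followedByA = "\\u00f3a" // 'a' is a hex digit, so a greedy scan overruns

    println(greedyHexLength(followedByN, 2)) // 4
    println(greedyHexLength(followedByA, 2)) // 5
    println(strictHexLength(followedByA, 2)) // 4
}
```

The greedy variant only appears correct when the character after the escape happens not to be a hex digit, which is why `"\u00f3n"` parsed correctly while `"\u00f3a"` did not.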
Technical Details
- Exactly four hex digits are read after `\u`, as required by the JSON specification.
- [...] `HEX_TABLE`.
- A bounds check (`currentPosition + 4 >= source.length`) to prevent `IndexOutOfBoundsException` on malformed inputs ending abruptly.
Tests
Added `testUnicodeEscapeWithFollowingHex` to verify that `\u00f3a` is correctly parsed as the character `ó` followed by the character `a`.
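As a rough sketch of how the pieces described under Technical Details could fit together (a fixed four-digit read, a lookup table for hex values, and a bounds check), consider the standalone example below. Everything in it is an assumption made for illustration: the `HEX_TABLE` layout, `decodeUnicodeEscape`, and `unescape` are hypothetical and do not reproduce the PR's code. In particular, the sketch checks `pos + 4 > source.length` with `pos` pointing just past `\u`; the PR's condition (`currentPosition + 4 >= source.length`) depends on where `currentPosition` points, which the description does not state.

```kotlin
// Hypothetical sketch: HEX_TABLE, decodeUnicodeEscape and unescape are
// illustrative names, not the library's actual code.

// Maps an ASCII code to its hex value, or -1 when the character is not a hex digit.
private val HEX_TABLE = IntArray(128) { -1 }.also { table ->
    for (c in '0'..'9') table[c.code] = c - '0'
    for (c in 'a'..'f') table[c.code] = c - 'a' + 10
    for (c in 'A'..'F') table[c.code] = c - 'A' + 10
}

// Decodes the four hex digits starting at `pos` (just past "\u").
fun decodeUnicodeEscape(source: String, pos: Int): Char {
    // Bounds check: reject input that ends before four digits are available.
    if (pos + 4 > source.length) {
        throw IllegalArgumentException("Unexpected end of input in unicode escape")
    }
    var value = 0
    for (i in pos until pos + 4) {
        val c = source[i]
        val digit = if (c.code < HEX_TABLE.size) HEX_TABLE[c.code] else -1
        if (digit < 0) throw IllegalArgumentException("Invalid hex digit '$c'")
        value = (value shl 4) or digit
    }
    return value.toChar()
}

// Minimal unescaper that handles only \uXXXX and copies everything else verbatim.
fun unescape(source: String): String {
    val out = StringBuilder()
    var i = 0
    while (i < source.length) {
        if (source[i] == '\\' && i + 1 < source.length && source[i + 1] == 'u') {
            out.append(decodeUnicodeEscape(source, i + 2))
            i += 6 // backslash + 'u' + exactly four hex digits
        } else {
            out.append(source[i])
            i++
        }
    }
    return out.toString()
}

fun main() {
    // Mirrors the intent of testUnicodeEscapeWithFollowingHex: 'a' must survive as a literal.
    println(unescape("\\u00f3a") == "\u00f3a") // true
}
```

Because the loop always advances by exactly six characters per escape, a trailing hex-looking character such as `a` is left for the normal copy path, and truncated input fails with an explicit error instead of an out-of-range index.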